Abstract. We consider nonparametric regression with a random design, and we are interested in the adaptive estimation of the regression function at a point $x_0$ where the design is degenerate. When the design density is $\beta$-regularly varying at $x_0$ and $f$ has smoothness $s$ in the Hölder sense, we know from Gaïffas (2004) that the minimax rate is equal to $n^{-s/(1+2s+\beta)} \ell(1/n)$, where $\ell$ is slowly varying. In this paper we provide an estimator which is adaptive both to the design and to the smoothness of the regression function, and we show that it converges with the rate $(\log n/n)^{s/(1+2s+\beta)} \ell(\log n/n)$. The procedure consists of a local polynomial estimator with a Lepski-type data-driven bandwidth selector similar to the one in Goldenshluger and Nemirovski (1997) or Spokoiny (1998). Moreover, we prove that the payment of a log in this adaptive rate compared to the minimax rate is unavoidable.


1. Introduction 

1.1. The model. We observe $n$ pairs of random variables $(X_i, Y_i) \in \mathbb{R} \times \mathbb{R}$, independent and identically distributed, satisfying

$Y_i = f(X_i) + \xi_i$, (1.1)

where $f : [0,1] \to \mathbb{R}$ is the unknown signal to be recovered, the variables $(\xi_i)$ are centered Gaussian with variance $\sigma^2$ and independent of the design $X_1, \ldots, X_n$. The variables $X_i$ are distributed with respect to a density $\mu$. We want to recover $f$ at a fixed point $x_0$.

The classical way to consider the nonparametric regression model is to take $X_i = i/n$. In this model with an equispaced design the observations are homogeneously distributed over the unit interval. If we take the $X_i$ random we can model cases with inhomogeneous observations, when the design distribution is "far" from the uniform law. We allow here the density $\mu$ to be degenerate (vanishing or exploding) and we are more precisely interested in the adaptive estimation of $f$ at a point where the design is degenerate, namely a point with very inhomogeneous data.

1.2. Motivations. The adaptive estimation of the regression function is a well-developed 
problem. Several adaptive procedures can be applied for the estimation of a function with 
unknown smoothness: nonlinear wavelet estimation (thresholding), model selection, kernel 
estimation with a variable bandwidth (the Lepski method), and so on. 
Recent results dealing with the adaptive estimation of the regression function when the design is not equispaced or random include Antoniadis et al. (1997), Brown and Cai (1998), Wong and Zheng (2002), Maxim (2003), Delouille et al. (2004), Kerkyacharian and Picard (2004), among others. A natural question arises: what happens if we want to estimate adaptively the regression function at a point where the design is degenerate? In Gaïffas (2004) we proved, when $\mu$ varies regularly at $x_0$, that the minimax convergence rate $\psi_n$ over a Hölder-type regularity class with smoothness $s > 0$ (around $x_0$) satisfies

$\psi_n \asymp n^{-s/(1+2s+\beta)} \ell(1/n)$ as $n \to +\infty$,

where $\beta$ is the regular variation index of $\mu$ at $x_0$ (see definition 4) and $\ell$ is slowly varying (the notation $a_n \asymp b_n$ means $0 < \liminf_n a_n/b_n \le \limsup_n a_n/b_n < +\infty$). For the proof of the upper bound, a (non-adaptive) linear procedure was used.

The next logical step is then to find a procedure able to recover $f$ with as little prior knowledge as possible on its smoothness and on the design density. On pointwise adaptive curve estimation (in the regression or the white noise model) see Lepski (1990), Lepski and Spokoiny (1997), Spokoiny (1998) and Brown and Cai (1998) for wavelet methods.

1.3. Organisation of the paper. We introduce the estimator in section 2. In section 3 we give upper bounds for this procedure conditionally on the design, see theorem 1, and in the regular variation framework, see theorem 2. In section 4 we prove that the obtained convergence rate is optimal, see theorem 3 and its corollary. We present numerical illustrations in section 5 for several datasets and we discuss in detail some points in section 6. Section 7 is devoted to the proofs and we recall some well-known facts on regularly varying functions in the appendix.


2. The procedure 

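2.1. Local polynomial estimation. Let $\kappa \in \mathbb{N}$ and $h > 0$ (the bandwidth). We define

$N_{n,h} = \#\{X_i \text{ such that } X_i \in [x_0 - h, x_0 + h]\}$,

and we introduce the pseudo-scalar product

$\langle f, g \rangle_h = \frac{1}{N_{n,h}} \sum_{|X_i - x_0| \le h} f(X_i) g(X_i)$,

and $\|\cdot\|_h$ the corresponding pseudo-norm. Let $\phi_j(x) = (x - x_0)^j$ for $j = 0, \ldots, \kappa$. We introduce the matrix $\mathbf{X}_h$ and the vector $\mathbf{Y}_h$ with entries, for $0 \le j, l \le \kappa$:

$(\mathbf{X}_h)_{j,l} = \langle \phi_j, \phi_l \rangle_h$ and $(\mathbf{Y}_h)_j = \langle Y, \phi_j \rangle_h$. (2.1)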

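Definition 1. Let

$\bar f_{h,\kappa} = \hat\theta_{h,0}\phi_0 + \hat\theta_{h,1}\phi_1 + \cdots + \hat\theta_{h,\kappa}\phi_\kappa$ when $N_{n,h} > 0$, and $\bar f_{h,\kappa} = 0$ when $N_{n,h} = 0$,

where $\hat\theta_h$ is the solution of the linear system

$\bar{\mathbf{X}}_h \theta = \mathbf{Y}_h$, (2.2)

where

$\bar{\mathbf{X}}_h = \mathbf{X}_h + N_{n,h}^{-1/2} I_{\kappa+1} 1_{\{\lambda(\mathbf{X}_h) \le N_{n,h}^{-1/2}\}}$,

with $\lambda(M)$ standing for the smallest eigenvalue of a matrix $M$ and $I_{\kappa+1}$ the identity matrix in $\mathbb{R}^{\kappa+1}$.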
This procedure is slightly different from the classical version of the local polynomial estimator. We note that the correction term in $\bar{\mathbf{X}}_h$ entails $\lambda(\bar{\mathbf{X}}_h) \ge N_{n,h}^{-1/2}$. On local polynomial estimation, see Stone (1980), Fan and Gijbels (1995, 1996), Spokoiny (1998) and Tsybakov (2003) among many others.
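
As an illustration, the regularised estimator of definition 1 can be sketched in a few lines for the local linear case $\kappa = 1$, where the smallest eigenvalue of the $2 \times 2$ Gram matrix is available in closed form. This is a minimal sketch and not the implementation used for the simulations: the function name `local_linear_estimate` is hypothetical, and the correction term $N_{n,h}^{-1/2} I_2$, applied when $\lambda(\mathbf{X}_h) \le N_{n,h}^{-1/2}$, is the form of the correction assumed here.

```python
import math


def local_linear_estimate(xs, ys, x0, h):
    """Regularised local linear (kappa = 1) estimate of f(x0) with bandwidth h.

    Hypothetical helper: returns the first coefficient of the solution of
    X_bar theta = Y, where X_bar adds N^{-1/2} * I to the Gram matrix when
    its smallest eigenvalue is at most N^{-1/2} (assumed form of the correction).
    """
    window = [(x - x0, y) for x, y in zip(xs, ys) if abs(x - x0) <= h]
    n_h = len(window)
    if n_h == 0:
        return 0.0  # convention of definition 1 when the window is empty

    # Entries of the Gram matrix X_h = [[1, m1], [m1, m2]] and of the vector Y_h.
    m1 = sum(d for d, _ in window) / n_h
    m2 = sum(d * d for d, _ in window) / n_h
    b0 = sum(y for _, y in window) / n_h
    b1 = sum(d * y for d, y in window) / n_h

    # Smallest eigenvalue of a symmetric 2x2 matrix, in closed form.
    trace, det = 1.0 + m2, m2 - m1 * m1
    lam_min = 0.5 * (trace - math.sqrt(max(trace * trace - 4.0 * det, 0.0)))

    # Correction term: shift the diagonal when the matrix is nearly singular.
    shift = n_h ** -0.5 if lam_min <= n_h ** -0.5 else 0.0
    a00, a11 = 1.0 + shift, m2 + shift

    # Solve the 2x2 system by Cramer's rule; only theta_0 = f_bar(x0) is needed.
    det_bar = a00 * a11 - m1 * m1
    return (b0 * a11 - m1 * b1) / det_bar
```

With noiseless linear data and a well-conditioned window the correction is inactive and the sketch returns the exact value of the line at $x_0$; with a single observation in the window the correction is active and the estimate is shrunk towards zero.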

2.2. Adaptive bandwidth selection. The procedure selects the bandwidth $h$ in a set $\mathcal{H}$ called the grid, which is a tuning parameter of the adaptive procedure. We can choose either an arithmetical or a geometrical grid:

$\mathcal{H}_a^{\mathrm{arith}} = \bigcup_{i=1}^{[(n-2)/a]} \{h_{2+[ia]}\}$ for $a \ge 1$, or

$\mathcal{H}_a^{\mathrm{geom}} = \bigcup_{i=1}^{[\log_a n]} \{h_{[a^i]}\}$ for $a > 1$,

where $h_i = |X_{(i)} - x_0|$ and where $|X_{(i)} - x_0| \le |X_{(i+1)} - x_0|$ for any $i = 1, \ldots, n-1$. Note that $[x]$ stands for the integer part of $x$. We define

$\mathcal{H}_h = \{h' \in \mathcal{H} \text{ such that } h' \le h\}$.

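The bandwidth is selected as follows:

$\hat H_n = \max\{h \in \mathcal{H} \mid \forall h' \in \mathcal{H}_h, \forall\, 0 \le j \le \kappa, \; |\langle \bar f_{h,\kappa} - \bar f_{h',\kappa}, \phi_j \rangle_{h'}| \le \sigma \|\phi_j\|_{h'} T_{n,h',h}\}$, (2.3)

where $\bar f_{h,\kappa}$ is given by definition 1 and where the threshold $T_{n,h',h}$ is equal to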
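$C_\kappa \sqrt{C_p N_{n,h'}^{-1} \log N_{n,h}} + \sqrt{(N_{n,h} - a)^{-1} \log n}$ if $\mathcal{H} = \mathcal{H}_a^{\mathrm{arith}}$,

$C_\kappa \sqrt{C_p N_{n,h'}^{-1} \log N_{n,h}} + \sqrt{(1 + a) N_{n,h}^{-1} \log n}$ if $\mathcal{H} = \mathcal{H}_a^{\mathrm{geom}}$, (2.4)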


with $C_\kappa = 1 + \sqrt{\kappa + 1}$ and $C_p = 8(1 + 2p)$, where $p$ fits with the loss function in (3.1) and $a$ is the grid parameter. The estimator is then

$\hat f_n(x_0) = \bar f_{\hat H_n,\kappa}(x_0)$. (2.5)

The selection rule (2.3) is similar to the method by Lepski, see Lepski (1990), Lepski et al. (1997) and Lepski and Spokoiny (1997); in addition to the original Lepski method, it is sensitive to the design. This procedure is close to the one in Spokoiny (1998). See section 6 for more details on existing procedures in the literature.
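
To make the selection rule concrete, here is a minimal sketch of a Lepski-type bandwidth choice in the simplest case $\kappa = 0$ (local constant fit), where the estimator is the window average, $\phi_0 \equiv 1$ and $\|\phi_0\|_{h'} = 1$. The function name `select_bandwidth`, the default values of `a` and `p`, and the exact indexing of the threshold (taken here as $C_\kappa\sqrt{C_p N_{n,h'}^{-1}\log N_{n,h}} + \sqrt{(1+a)N_{n,h}^{-1}\log n}$ for a geometrical grid) are assumptions made for this illustration; the noise level $\sigma$ is taken as known.

```python
import math


def select_bandwidth(xs, ys, x0, sigma, a=1.5, p=2.0):
    """Lepski-type bandwidth selection for a local constant fit (kappa = 0).

    Illustrative sketch: geometrical grid h_[a^i] built from the ordered
    distances |X_(i) - x0|; the threshold indexing is an assumption.
    Returns (selected bandwidth, estimate of f(x0)).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    order = sorted(range(n), key=lambda i: abs(xs[i] - x0))
    dists = [abs(xs[i] - x0) for i in order]

    # Geometrical grid: distances of rank [a^i], i = 1, ..., [log_a n].
    ranks = sorted({int(a ** i) for i in range(1, int(math.log(n, a)) + 1)})
    grid = sorted({dists[r - 1] for r in ranks if 1 <= r <= n})
    if not grid:
        raise ValueError("empty grid: increase n or decrease a")

    def window_stats(h):
        """Number of design points within distance h of x0 and their mean response."""
        inside = [ys[i] for i, d in zip(order, dists) if d <= h]
        return len(inside), sum(inside) / len(inside)

    stats = {h: window_stats(h) for h in grid}
    c_kappa = 1.0 + math.sqrt(0 + 1)   # C_kappa = 1 + sqrt(kappa + 1) with kappa = 0
    c_p = 8.0 * (1.0 + 2.0 * p)        # C_p = 8 (1 + 2p)

    def accepted(h):
        """Check that the fit at h agrees with every fit at a smaller grid bandwidth."""
        n_h, mean_h = stats[h]
        for h_small in grid:
            if h_small > h:
                break
            n_s, mean_s = stats[h_small]
            threshold = (c_kappa * math.sqrt(c_p * math.log(n_h) / n_s)
                         + math.sqrt((1.0 + a) * math.log(n) / n_h))
            if abs(mean_h - mean_s) > sigma * threshold:
                return False
        return True

    # The smallest grid bandwidth is always accepted, so the max is well defined.
    h_sel = max(h for h in grid if accepted(h))
    return h_sel, stats[h_sel][1]
```

On a signal that is flat near $x_0$ and jumps further away, the sketch keeps enlarging the window until the window average departs from the averages over smaller windows by more than the threshold, and returns the last accepted bandwidth.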


3. Upper bounds 

We measure the performance of a procedure $\tilde f_n$ over a class $\Sigma$ (to be specified in the following) with the maximal risk

$\big( \sup_{f \in \Sigma} E^n_{f,\mu}\{|\tilde f_n(x_0) - f(x_0)|^p\} \big)^{1/p}$, (3.1)

where $x_0$ is the estimation point and $p \ge 1$. The expectation $E^n_{f,\mu}$ in (3.1) is taken with respect to the joint law $P^n_{f,\mu}$ of the observations (1.1).
3.1. Regular variation. The definition of regular variation and its main properties are due to Karamata (1930). On this topic we refer to Seneta (1976), Geluk and de Haan (1987), Resnick (1987) and Bingham et al. (1989).

Definition 2 (Regular variation). A continuous function $\nu : \mathbb{R}^+ \to \mathbb{R}^+$ is regularly varying at 0 if there is a real number $\beta \in \mathbb{R}$ such that

$\forall y > 0, \quad \lim_{h \to 0^+} \nu(yh)/\nu(h) = y^\beta$. (3.2)

We denote by RV($\beta$) the set of all such functions. A function in RV(0) is slowly varying.

Remark. Roughly speaking, a regularly varying function behaves as a power function times a slower term. Typical examples of such functions are $x^\beta$, $x^\beta(\log(1/x))^\gamma$ and more generally any power function times a log or compositions of log to some power. For other examples, see the references.
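
The defining limit of regular variation can be checked numerically on the typical example above. The following is a small sketch, with a hypothetical helper name `rv_ratio`, that evaluates $\nu(yh)/\nu(h)$ for $\nu(x) = x^\beta(\log(1/x))^\gamma$; it assumes $0 < h < 1$ and $yh < 1$ so that the logarithm stays positive.

```python
import math


def rv_ratio(beta, gamma, y, h):
    """Ratio v(y h) / v(h) for v(x) = x**beta * log(1/x)**gamma, with 0 < h, y*h < 1."""
    def v(x):
        return x ** beta * math.log(1.0 / x) ** gamma

    return v(y * h) / v(h)
```

For $\gamma = 0$ the ratio equals $y^\beta$ for every $h$. For $\gamma \ne 0$ the ratio is $y^\beta (1 - \log y/\log(1/h))^\gamma$, so it approaches $y^\beta$ only at a logarithmic speed as $h \to 0^+$, which is what "slowly varying" conveys.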

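Definition 3. If $\delta > 0$ and $\omega \in$ RV($s$) with $s > 0$ we define the class $\mathcal{F}_\delta(x_0, \omega)$ of all functions $f : \mathbb{R} \to \mathbb{R}$ such that

$\forall h \le \delta, \quad \inf_{P \in \mathcal{P}_k} \sup_{|x - x_0| \le h} |f(x) - P(x - x_0)| \le \omega(h)$,

where $k = \lfloor s \rfloor$ (the largest integer smaller than $s$) and $\mathcal{P}_k$ is the set of all the real polynomials with degree $k$. We define $\ell_\omega(h) = \omega(h) h^{-s}$, the slow variation term of $\omega$. If $\alpha > 0$ we define

$\mathcal{U}(\alpha) = \{f : [0,1] \to \mathbb{R} \text{ such that } \|f\|_\infty \le \alpha\}$.

Finally, we define

$\Sigma_{\delta,\alpha}(x_0, \omega) = \mathcal{F}_\delta(x_0, \omega) \cap \mathcal{U}(\alpha)$.

Remark. If $\omega(h) = r h^s$ for $r > 0$ we recover the classical Hölder regularity with radius $r$. In this sense, the class $\mathcal{F}_\delta(x_0, \omega)$ is a slight generalisation of Hölder regularity.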

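3.2. Conditionally on the design. When nothing is known on the behaviour of the design density, we can work conditionally on the design. Let $\mathfrak{X}_n$ be the sigma-algebra generated by $X_1, \ldots, X_n$. We define

$H_{n,\omega} = \min\{h \in [0,1] \text{ such that } \omega(h) \ge \sigma \sqrt{N_{n,h}^{-1} \log n}\}$, (3.3)

which is well defined for $n$ large enough (when $\omega(1) \ge \sigma\sqrt{\log n/n}$). The quantity $H_{n,\omega}$ balances the bias and the log-penalised variance of $\bar f_{h,\kappa}$ (see lemma 1) and therefore can be understood as the ideal adaptive bandwidth, see Lepski and Spokoiny (1997) and Spokoiny (1998). The log term in (3.3) is the payment for adaptation, see section 4.1. Let us define

$H^*_{n,\omega} = \max\{h \in \mathcal{H} \mid h \le H_{n,\omega}\}$

and

$R_{n,\omega} = \sigma \sqrt{N_{n,H^*_{n,\omega}}^{-1} \log n}$.

We define the diagonal matrix $\Lambda_h = \mathrm{diag}(\|\phi_0\|_h^{-1}, \ldots, \|\phi_\kappa\|_h^{-1})$, the symmetrical matrix $\mathcal{G}_h = \Lambda_h \mathbf{X}_h \Lambda_h$ and $\lambda_{n,\omega} = \lambda(\mathcal{G}_{H^*_{n,\omega}})$. We define the event

$\Omega_h = \{X_1, \ldots, X_n \text{ are such that } \lambda(\mathbf{X}_h) > N_{n,h}^{-1/2} \text{ and } N_{n,h} \ge 2\}$. (3.5)

We note that $\Omega_h \in \mathfrak{X}_n$ and $\mathbf{X}_h$ is invertible on $\Omega_h$. The next result shows that, conditional on $\mathfrak{X}_n$, $\hat f_n(x_0) = \bar f_{\hat H_n,\kappa}(x_0)$ converges with the rate $R_{n,\omega}$ simultaneously over any $\Sigma_{\delta,\alpha}(x_0, \omega)$ when $\omega \in$ RV($s$) with $0 < s \le \kappa + 1$.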
Theorem 1. If $\omega \in$ RV($s$), $0 < s \le \kappa + 1$ and $\alpha > 0$ we have for any $n \ge \kappa + 1$ on $\Omega_{H^*_{n,\omega}}$:

$\sup_{f \in \Sigma_{\delta,\alpha}(x_0,\omega)} E^n_{f,\mu}\{R_{n,\omega}^{-p} |\hat f_n(x_0) - f(x_0)|^p \mid \mathfrak{X}_n\} \le c_1 \lambda_{n,\omega}^{-p} + c_2 (\alpha \vee 1)^p (\log n)^{-p/2}$,

where $c_1 = c_1(p, \kappa, a)$ and $c_2 = c_2(p, \kappa, a, \sigma)$.

We will see that the probability of the event $\Omega_{H^*_{n,\omega}}$ is large and that $\lambda_{n,\omega}$ is positive with a large probability when the design density is regularly varying (see section 7.2). Note that the upper bound in theorem 1 is non-asymptotic in the sense that it holds for any $n \ge \kappa + 1$. The random normalisation $R_{n,\omega}$ is similar to the one in Guerre (1999), see section 6.2 for more details.


3.3. Regularly varying design. 


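Definition 4. For $\beta > -1$ and a neighbourhood $W$ of $x_0$ we define

$\mathcal{R}(x_0, \beta) = \{\mu \text{ density such that } \exists \nu \in \mathrm{RV}(\beta), \forall x \in W, \; \mu(x) = \nu(|x - x_0|)\}$.

We assume in all the following that $\mu \in \mathcal{R}(x_0, \beta)$ for $\beta > -1$. Let $h_{n,\omega}$ be the smallest solution to

$\omega(h) = \sigma \sqrt{\frac{\log n}{2n \int_0^h \nu(t)dt}}$ (3.6)

and

$r_{n,\omega} = \omega(h_{n,\omega})$. (3.7)

Equation (3.6) can be viewed as the deterministic counterpart to the equilibrium in (3.3). We define $C_{a,\beta} = \frac{1 + (-1)^a}{a + \beta + 1}$ and the matrix $\mathcal{G}$ with entries $(\mathcal{G})_{j,l} = \frac{C_{j+l,\beta}}{\sqrt{C_{2j,\beta} C_{2l,\beta}}}$ for $0 \le j, l \le \kappa$, and $\lambda_{\kappa,\beta} = \lambda(\mathcal{G})$. It is easy to see that $\lambda_{\kappa,\beta} > 0$.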


Theorem 2. If 

• K £ N, P > —1, a > 0 and Q > I, 

• oj £ RV(s) for 0 < s ^ K + 1, 

then the estimator fn{xo) = f^^R (a^o) with the grid H = satisfies 

\fp£lZ{xo,P) limsup sup |/„(xo) -/(xo)|^} ^ CA”^, (3.8) 

where C = C{p, k) . Moreover, we have 

$r_{n,\omega} \sim (\log n/n)^{s/(1+2s+\beta)} \ell_{\omega,\nu}(\log n/n)$ as $n \to +\infty$, (3.9)

where $\ell_{\omega,\nu}$ is slowly varying.

Remark. When uj{h) = rh^ (Holder regularity) we have more precisely 

Xn,u) ~ (log re/re)^/^^’''^^''“^^G,i/(log re/re) as re ^ +oo. 

Note that G(/i) = ^,y{h\og{l/h)) is also slowly varying, thus fi(l/re) = (logre/re) is a 
slow term. 
3.4. Convergence rates examples. Let $\beta > -1$, $r, s$ be positive and $\alpha, \gamma$ be any real numbers. If we take $\nu$ such that $\int_0^h \nu(t)dt = h^{\beta+1}(\log(1/h))^\alpha$ and $\omega(h) = r h^s (\log(1/h))^\gamma$ then we find that (see section 7 for the computation details)

$r_{n,\omega} \sim \sigma^{2s/(1+2s+\beta)} r^{(\beta+1)/(1+2s+\beta)} (n(\log n)^{\alpha-1})^{-s/(1+2s+\beta)} (\log n)^{\gamma(\beta+1)/(1+2s+\beta)}$, (3.10)

where $a_n \sim b_n$ means $\lim_{n \to +\infty} a_n/b_n = 1$. This rate has to be compared with the minimax rate from Gaïffas (2004):

$\sigma^{2s/(1+2s+\beta)} r^{(\beta+1)/(1+2s+\beta)} (n(\log n)^{\alpha})^{-s/(1+2s+\beta)} (\log n)^{\gamma(\beta+1)/(1+2s+\beta)}$,

where the only difference is the $\alpha$ instead of $\alpha - 1$ in the log exponent. This loss is the payment for adaptation and is unavoidable in view of theorem 3 and its corollary. See section 4 for more details.

In the classical case, namely when the design is non-degenerate and $f$ is Hölder ($\omega(h) = r h^s$ and $\alpha = \beta = \gamma = 0$) we find the usual pointwise minimax adaptive rate (see Lepski (1990), Brown and Low (1996)):

$\sigma^{2s/(1+2s)} r^{1/(1+2s)} (\log n/n)^{s/(1+2s)}$.

When the design is again non-degenerate and the continuity modulus is equal to $\omega(h) = r h^s (\log(1/h))^{-s}$ we find a convergence rate equal to

$\sigma^{2s/(1+2s)} r^{1/(1+2s)} n^{-s/(1+2s)}$,

which is the usual minimax rate, without the log term for payment for adaptation. Actually, this is a "toy" example since we have asked for more regularity than in the Hölder regularity. Note that in the degenerate design case, when $\alpha$ and $\gamma$ are such that $\alpha = 1 + \gamma(1 + \beta)/s$ there is again no extra log factor.
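
The effect of the design index $\beta$ and of the log payment on these rates can be made tangible numerically. The sketch below only evaluates the polynomial parts $n^{-s/(1+2s+\beta)}$ and $(\log n/n)^{s/(1+2s+\beta)}$; the slowly varying terms and the constants are deliberately omitted, and the function names are hypothetical.

```python
import math


def rate_exponent(s, beta):
    """Exponent s / (1 + 2s + beta) shared by the minimax and adaptive rates."""
    return s / (1.0 + 2.0 * s + beta)


def minimax_rate(n, s, beta):
    """Polynomial part n^{-s/(1+2s+beta)} of the minimax rate (slow term omitted)."""
    return n ** (-rate_exponent(s, beta))


def adaptive_rate(n, s, beta):
    """Polynomial part (log n / n)^{s/(1+2s+beta)} of the adaptive rate (slow term omitted)."""
    return (math.log(n) / n) ** rate_exponent(s, beta)
```

For instance, with $s = 2$ the exponent is $0.4$ for a non-degenerate design ($\beta = 0$) and $1/3$ when the density vanishes linearly at $x_0$ ($\beta = 1$), while an exploding density ($\beta = -0.5$) gives $4/9$. The ratio of the two rates is $(\log n)^{s/(1+2s+\beta)}$, which is the payment for adaptation in this simplified setting.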


4. Optimality 


4.1. Payment for adaptation. The convergence rate of a linear estimator with an adaptive bandwidth choice can be well explained with a balance equation between its bias and variance terms. In our context this equation is

$\omega(h) = \frac{\sigma}{\sqrt{N_{n,h}}}$

(see lemma 1) and a deterministic counterpart of this equilibrium is

$\omega(h) = \frac{\sigma}{\sqrt{2n \int_0^h \nu(t)dt}}$, (4.1)

see lemma 5. We proved in Gaïffas (2004) that the minimax rate $\psi_{n,\omega}$ over $\mathcal{F}_\delta(x_0, \omega)$ is given by

$\psi_{n,\omega} = \omega(\gamma_{n,\omega})$, (4.2)

where $\gamma_{n,\omega}$ is the smallest solution to (4.1). In a model with homogeneous information (the white noise or the regression model with an equidistant design) we know that such a balance equation cannot be realised: an adaptive estimator to the unknown smoothness without loss of efficiency is not possible for pointwise estimation, even if we know that the function belongs to one of two Hölder classes, see Lepski (1990), Brown and Low (1996) and Lepski and Spokoiny (1997). This means that local adaptation cannot be achieved for free: we have to pay an extra log factor in the convergence rate, at least of order $(\log n)^{2s/(1+2s)}$ when estimating a Hölder function with smoothness $s$. The authors call this phenomenon payment for adaptation. We intend here to generalise this result to the regression with a degenerate random design.

4.2. SuperefRciency. Let s, r' < r, be positive and <5 ^ 1, p > 1. We take a;(/i) = 
uj'{h) = r'h^ and the minimax rate defined by 114.211 . In view of lemmaElwe have 

i^n,L 0 ~ as n ^ +oo. (4.3) 

We recall that in view of theorem 2 the "adaptive" rate $r_{n,\omega}$ defined by (3.7) is attained by the adaptive procedure $\hat f_n(x_0)$ simultaneously over several classes $\Sigma_{\delta,\alpha}(x_0, \omega)$ with $\omega \in$ RV($s$) for any regularity $s \in (0, \kappa + 1]$ and that

$r_{n,\omega} \sim c_{s,\beta,\sigma,r} (\log n/n)^{s/(1+2s+\beta)} \ell_{s,\nu}(\log n/n)$ as $n \to +\infty$. (4.4)

Theorem 3. If an estimator fn based on jni) is asymptotically minimax over J-s{xo,uj), 
that is 

limsup sup E”^^{|^(xo) - /(^o)!^} < + 00 , 

" f£j^s{xo,u}) 

and if this estimator is superefficient at a function /o G J-s{xo,lo') in the sense that there is 
7 > 0 such that 

limsupE^^^^{|^(a:o) - Mxo)^ < + 00 , (4.5) 

n 

then we can find a function fi G !Fs{xo:^) such that 

lminfr“P ^%,f,{\fn{xo) - /i(xo)|^} > 0 . 

This theorem is a generalisation of a result by Brown and Low (1996) to the degenerate random design case. Of course, when the design is non-degenerate ($0 < \mu(x_0) < +\infty$) the theorem remains valid and the result is essentially the same as in Brown and Low (1996), with the same rates.

Theorem 3 is a lower bound for a superefficient estimator. Actually, the most interesting result for our problem is the next corollary.

4.3. An adaptive lower bound. Let $0 < r_2 < r_1 < +\infty$ and $0 < s_1 < s_2 < +\infty$ be such that $\lfloor s_1 \rfloor = \lfloor s_2 \rfloor = k$. If $\omega_i(h) = r_i h^{s_i}$ we denote $\mathcal{F}_i = \mathcal{F}_\delta(x_0, \omega_i)$. Let $\psi_{n,i}$ be the minimax rate defined by (4.2) over $\mathcal{F}_i$ for $i = 1, 2$ and $r_{n,i}$ be defined by (3.7) with $\omega = \omega_i$ (the "adaptive" rate when the class is $\mathcal{F}_i$). Note that $\psi_{n,i}$ satisfies (4.3) with $s = s_i$ and $r_{n,i}$ satisfies (4.4) with $s = s_i$.

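Corollary 1. If an estimator $\tilde f_n$ is asymptotically minimax over $\mathcal{F}_1$ and $\mathcal{F}_2$, that is for $i = 1, 2$:

$\limsup_n \sup_{f \in \mathcal{F}_i} \psi_{n,i}^{-p} E^n_{f,\mu}\{|\tilde f_n(x_0) - f(x_0)|^p\} < +\infty$, (4.6)

then this estimator also satisfies

$\liminf_n \sup_{f \in \mathcal{F}_1} r_{n,1}^{-p} E^n_{f,\mu}\{|\tilde f_n(x_0) - f(x_0)|^p\} > 0$. (4.7)

Note that (4.7) contradicts (4.6) for $i = 1$ since $\lim_n \psi_{n,1}/r_{n,1} = 0$, thus there is no pointwise minimax adaptive estimator over two such classes $\mathcal{F}_1$ and $\mathcal{F}_2$ and the best achievable rate is $r_{n,i}$. Corollary 1 is an immediate consequence of theorem 3. Clearly, $\mathcal{F}_2 \subset \mathcal{F}_1$, thus equation (4.6) entails that $\tilde f_n$ is superefficient at any function $f_0 \in \mathcal{F}_2$. More precisely, $\tilde f_n$ satisfies (4.5) with $\gamma = \frac{(s_2 - s_1)(\beta + 1)}{2(1 + 2s_1 + \beta)(1 + 2s_2 + \beta)} > 0$ since $n^{-\gamma} \ell(1/n) \to 0$, where $\ell$, the ratio of the slow terms of $\psi_{n,2}$ and $\psi_{n,1}$, belongs to RV(0).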
5. Simulations

5.1. Implementation of the procedure. For the estimation at a point $x$, the procedure (2.3) selects the best symmetrical interval $I = [x - h, x + h]$ among several $h$ in the grid $\mathcal{H}$. We have implemented this procedure with non-symmetrical intervals, which is a procedure similar to the one in Spokoiny (1998). First, we define similarly to section 2.1 for any $I \subset [0,1]$ the scalar product

$\langle f, g \rangle_I = \sum_{X_i \in I} f(X_i) g(X_i)$

(it is convenient in this part to remove the normalisation term from the definition of the scalar product) and similarly to (2.1) we define the matrix $\mathbf{X}_I$ with entries $(\mathbf{X}_I)_{j,l} = \langle \phi_j, \phi_l \rangle_I$ for $0 \le j, l \le \kappa$. We define in the same way $\mathbf{Y}_I$, and $\hat\theta_I$ is defined as the solution to

$\mathbf{X}_I \theta = \mathbf{Y}_I$.

Note that if $J \subset [0,1]$, the vector $F_{I,J}$ with coordinates

$(F_{I,J})_j = \langle \bar f_{I,\kappa} - \bar f_{J,\kappa}, \phi_j \rangle_J / \|\phi_j\|_J$

(for $0 \le j \le \kappa$) satisfies

$F_{I,J} = \mathbf{H}_J(\hat\theta_I - \hat\theta_J)$,

where $\mathbf{H}_J$ is defined as the matrix with entries, for $0 \le j, l \le \kappa$,

$(\mathbf{H}_J)_{j,l} = \langle \phi_j, \phi_l \rangle_J / \|\phi_j\|_J$.

The main steps of the procedure for the estimation at a point $x$ are then:

(1) choose parameters $a > 1$, $\kappa \in \mathbb{N}$ and $m \ge \kappa + 1$,

(2) sort the $(X_i, Y_i)$ in $(X_{(i)}, Y_{(i)})$ such that $X_{(i)} \le X_{(i+1)}$,

(3) hnd j such that x G [X(j),X q_|_i)] and G [Xq),= m, 

(4) build 

[loga(i + l)] [loga(’T’-i)] 

= IJ g+ = IJ {XQ+[„i])}, 

2=0 2 = 0 

(5) compute $\hat\theta_I$ and $\mathbf{H}_I$ for all $I \in \mathcal{G}$,

(6) if $N_{n,I} = \#\{X_i \mid X_i \in I\}$, find

$\hat I = \arg\max_{I \in \mathcal{G}} \{N_{n,I} \text{ such that } \forall J \subset I, J \in \mathcal{G}, \; \|\mathbf{H}_J(\hat\theta_I - \hat\theta_J)\|_\infty \le T_{I,J}\}$,

where $\|\cdot\|_\infty$ stands for the sup norm in $\mathbb{R}^{\kappa+1}$ and

$T_{I,J} = \sigma\big((1 + \sqrt{\kappa + 1})\sqrt{\log N_{n,I}} + \sqrt{1 + a}\sqrt{(N_{n,J}/N_{n,I}) \log n}\big)$,

with $\sigma$ for instance given by (6.3),

(7) return the first coordinate of $\hat\theta_{\hat I}$.

This procedure uses a geometrical grid, thus it is computationally feasible for reasonable choices of $a$ ($a = 1.05$ is used in the next section). The main steps of the procedure with an arithmetical grid are the same with a modification of the threshold, see (2.4). The procedure is implemented in C++ and is quite fast: it takes a few seconds to recover the whole function at 300 points on a modern computer.
5.2. Numerical illustrations. We use for our simulations the target functions from Donoho and Johnstone (1994). These functions are commonly used as benchmarks for adaptive estimators. We show in figure 1 the target functions and datasets with a uniform random design. The noise is Gaussian with $\sigma$ chosen to have (root) signal-to-noise ratio 7. The sample size is $n = 2000$. We show the estimates in figure 2. For all estimates we take $\kappa = 2$, $a = 1.05$ and $m = 25$. We estimate at each point $x = j/300$ with $j = 0, \ldots, 300$.



Figure 1. Blocks, bumps, heavysine and doppler with Gaussian noise and 
uniform design. 






Figure 2. Estimates based on the datasets in figure 1.
Note that these estimates can be slightly improved with case by case tuned parameters: for instance, for the first dataset (blocks), the choice $\kappa = 0$ gives a slightly better looking estimate (the target function is piecewise constant). In figure 3 we show datasets with the same signal-to-noise ratio and sample size as in figure 1 but the design is non-uniform (we plot the design density on each of them). We show the estimates based on these datasets in figure 4. The same parameters as for figure 2 are used.





Figure 3. Blocks, bumps, heavysine and doppler with Gaussian noise and 
non-uniform design. 



Figure 4. Estimates based on the datasets in figure 3.
In figures 5 and 6 we give a more localised illustration of the heavysine dataset. We keep the same signal-to-noise ratio and sample size. We consider the design density

$\mu(x) = \frac{\beta + 1}{x_0^{\beta+1} + (1 - x_0)^{\beta+1}} |x - x_0|^\beta 1_{[0,1]}(x)$, (5.1)

for $x_0 = 0.2, 0.72$ and $\beta = -0.5, 1$.
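
A design with a density proportional to $|x - x_0|^\beta$ on $[0,1]$ can be simulated by inverting its distribution function. The sketch below is an illustration only (the helper names `design_quantile` and `sample_design` are hypothetical, and this is not claimed to be how the datasets of the figures were generated); it assumes $\beta > -1$, $0 < x_0 < 1$ and the normalising constant $(\beta+1)/(x_0^{\beta+1} + (1-x_0)^{\beta+1})$.

```python
import random


def design_quantile(u, x0, beta):
    """Inverse distribution function of the density proportional to |x - x0|**beta on [0, 1]."""
    left = x0 ** (beta + 1.0)                  # unnormalised mass of [0, x0]
    total = left + (1.0 - x0) ** (beta + 1.0)  # unnormalised mass of [0, 1]
    t = u * total
    if t <= left:
        return x0 - (left - t) ** (1.0 / (beta + 1.0))
    return x0 + (t - left) ** (1.0 / (beta + 1.0))


def sample_design(n, x0, beta, seed=0):
    """Draw n design points by applying the quantile function to uniform variables."""
    rng = random.Random(seed)
    return [design_quantile(rng.random(), x0, beta) for _ in range(n)]
```

For $\beta = 0$ the quantile function is the identity and the design is uniform. For $\beta = 1$ few points fall near $x_0$ (the density vanishes there), while for $\beta = -0.5$ the points concentrate around $x_0$ (the density explodes).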



Figure 5. Heavysine datasets and estimates with design density (5.1) with $x_0 = 0.2$ and $\beta = -0.5$ at top, $\beta = 1$ at bottom.



Figure 6. Heavysine datasets and estimates with design density (5.1) with $x_0 = 0.72$ and $\beta = -0.5$ at top, $\beta = 1$ at bottom.
6. Discussion


6.1. On the procedure.

• It is important to note that on the event $\Omega_h$ the estimator $\bar f_{h,\kappa}$ is equal to the classical local polynomial estimator defined by

$\bar f_{h,\kappa} = \arg\min_{g \in V_\kappa} \|g - Y\|_h^2$, (6.1)

where $V_\kappa = \mathrm{Span}\{(\phi_j)_{j=0,\ldots,\kappa}\}$. A necessary condition for $\bar f_{h,\kappa}$ to minimise (6.1) is to be solution of the linear problem

find $f \in V_\kappa$ such that $\forall \phi \in V_\kappa$, $\langle f, \phi \rangle_h = \langle Y, \phi \rangle_h$. (6.2)

The main idea of the procedure is the following: if $h$ is a good bandwidth, then for any $h' \le h$ and for all $\phi \in V_\kappa$ we should have in view of (6.2):

$\langle \bar f_h - \bar f_{h'}, \phi \rangle_{h'} = \langle \bar f_h - Y, \phi \rangle_{h'} \approx -\langle \xi, \phi \rangle_{h'}$,

which means that the difference $\bar f_h - \bar f_{h'}$ is mainly noise, in the sense that $\sigma^{-1} \langle \bar f_h - \bar f_{h'}, \phi \rangle_{h'}$ is close in law to a standard Gaussian.

• The procedure (2.3) looks like the Lepski procedure: in a model where the estimators can be well sorted by their respective variances (this is the case with kernel estimators in the white noise model, see Lepski and Spokoiny (1997)), the Lepski procedure selects the largest bandwidth such that the corresponding estimator does not differ significantly from estimators with a smaller bandwidth. Here the idea is the same, but the proposed procedure is additionally sensitive to the design.

• The estimator $\hat f_n(x_0)$ only depends on $\kappa$ and on the grid $\mathcal{H}$ (to be chosen by the statistician). It does not depend on the regularity of $f$ nor on any assumption on $\mu$. In this sense, this estimator is adaptive in both regularity and design.

• Note that $\mathbf{X}_h = N_{n,h}^{-1} F_h^T F_h$ where $F_h$ is the matrix of size $n \times (\kappa + 1)$ with entries $(F_h)_{i,j} = (X_i - x_0)^j 1_{\{|X_i - x_0| \le h\}}$ for $1 \le i \le n$ and $0 \le j \le \kappa$, and that $\ker \mathbf{X}_h = \ker F_h$. Thus when $n < \kappa + 1$, $\mathbf{X}_h$ is not invertible since its kernel is not zero, and $\Omega_h = \emptyset$. This is the reason why theorem 1 is stated for $n \ge \kappa + 1$ and in the step 3 of the procedure (see section 5.1) we must take $m \ge \kappa + 1$ so that each interval in $\mathcal{G}$ contains at least $\kappa + 1$ observations $X_i$.

• The reason why we need to take an arithmetical grid in theorem 2 is linked with the control of $\lambda_{n,\omega}$. We can prove the theorem with a geometrical grid if we additionally assume $\lambda_{n,\omega} > \lambda$ for some $\lambda > 0$, but we preferred to work only under the regularly varying design assumption with a restricted grid choice, without extra assumption on the model.

• The fact that the noise level $\sigma$ is known is of little importance. If it is unknown we can plug in some estimator $\hat\sigma_n^2$ in place of $\sigma^2$. Following Gasser et al. (1986) or Buckley et al. (1988) we can consider

$\hat\sigma_n^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (Y_{(i+1)} - Y_{(i)})^2$, (6.3)

where $Y_{(i)}$ is the observation at the point $X_{(i)}$, where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$.
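
The difference-based variance estimator above takes only a few lines. This is a minimal sketch, with a hypothetical function name, assuming the estimator is the sum of squared successive differences of the responses, sorted by design point, divided by $2(n-1)$.

```python
def difference_variance_estimate(xs, ys):
    """Estimate sigma^2 from squared successive differences of the responses,
    after sorting the observations by design point."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    ordered = [y for _, y in sorted(zip(xs, ys), key=lambda pair: pair[0])]
    total = sum((b - a) ** 2 for a, b in zip(ordered, ordered[1:]))
    return total / (2.0 * (len(ordered) - 1))
```

The heuristic is that, when $f$ is smooth and the design points are close, $Y_{(i+1)} - Y_{(i)}$ is approximately the difference of two independent noise variables, whose variance is $2\sigma^2$.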

6.2. Comparison with previous results. 

• In Guerre (1999), for the estimation of the regression function at the point 0 in a more general setup for the design, the author works conditionally on $\mathfrak{X}_n$ and gives an upper bound with a data-driven rate similar to $R_{n,\omega}$. The author considers then as an example the case of an i.i.d. design with density $\mu$ such that $\mu(x) \sim x^\beta$ close to 0 for $\beta > -1$, which is a particular case of regularly varying density at 0 of index $\beta$. Here the approach is the same: under the regular variation assumption we derive from theorem 1 an asymptotic upper bound with a deterministic rate (theorem 2).

• Bandwidth selection procedures in local polynomial estimation can be found in Fan and Gijbels (1995), Goldenshluger and Nemirovski (1997) or Spokoiny (1998). In this last paper the author is interested in the regression function estimation near a change point. The main idea and difference between the work by Spokoiny (1998) and the previous work by Goldenshluger and Nemirovski (1997) is to solve the linear problem (6.2) in a non-symmetrical neighbourhood of $x_0$ not containing the change point. Our adaptive procedure is mainly inspired by the work of Spokoiny and adapted to the degenerate random design problem. We have also made improvements: for instance, we do not need to bound the estimator and the function at $x_0$ by some known constant.


7. Proofs 

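In the following we denote by $P_{\kappa,h}$ the projection onto the space $V_\kappa$ for the scalar product $\langle \cdot, \cdot \rangle_h$. We denote respectively by $\langle \cdot, \cdot \rangle$ and by $\|\cdot\|$ the Euclidean scalar product and the Euclidean norm in $\mathbb{R}^{\kappa+1}$. We denote by $\|\cdot\|_\infty$ the sup norm in $\mathbb{R}^{\kappa+1}$. We define $e_1 = (1, 0, \ldots, 0)$, the first canonical basis vector in $\mathbb{R}^{\kappa+1}$.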

7.1. Preparatory results and proof of theorem 1. The next lemma is a version of the local polynomial estimator bias-variance decomposition, which is classical: see Cleveland (1979), Tsybakov (1986), Korostelev and Tsybakov (1993), Fan and Gijbels (1995, 1996), Goldenshluger and Nemirovski (1997), Spokoiny (1998) and Tsybakov (2003), among others. The version given by lemma 1 is close to the one in Spokoiny (1998). Let us introduce for any positive integer $k$ the continuity modulus

$\omega_{f,k}(x_0, h) = \inf_{P \in \mathcal{P}_k} \sup_{|x - x_0| \le h} |f(x) - P(x - x_0)|$.

Note that if $k_1 \le k_2$ we clearly have $\omega_{f,k_2}(x_0, h) \le \omega_{f,k_1}(x_0, h)$.

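Lemma 1 (Bias variance decomposition). On the event $\Omega_h$ the estimator $\bar f_h$ from definition 1 satisfies for any $k \le \kappa$

$|\bar f_h(x_0) - f(x_0)| \le \lambda^{-1}(\mathcal{G}_h) \sqrt{\kappa + 1} \big(\omega_{f,k}(x_0, h) + \sigma N_{n,h}^{-1/2} |\gamma_h|\big)$, (7.1)

where $\gamma_h$ is, conditional on $\mathfrak{X}_n$, centered Gaussian such that $E^n_{f,\mu}\{\gamma_h^2 \mid \mathfrak{X}_n\} \le 1$.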

Proof. On Glk we have X/j = X/j and A(X/j) > ^ > 0, then X/j is invertible. Since A/j 

is clearly invertible on this event, Qh is also invertible. Let 0 < e ^ By definition of 
x’f,K{xo, h) we can find a polynomial G Pk such that 

sup |/(x) - Plh(x)l ^ h) + 

x£[xo—h,xo+h] 


e 
In particular we have \ f{xo) — Pj i^{xo)\ ^ and if we denote by 9h the coefficients vector 
of Pj then 

IfhA^o) - fixo)\ ^- Oh ), ei)| + ^ = \{gi;^Ah^h{0h - Oh), ei)| + 

V n yTi 

Then in view of D one has for j = 0 ,..., k: 

(Xhih - 0h))j = {fh,. - Plh , ^j)h = {Y- Pl^ , cl^j)h = {f- Plh , cPj)h + (e, ^j)h, 

thus we can decompose 'KhiOh — Oh) — Bh PYh and then: 

IfhA^o) - fixo)\ <\{Gi:^AhBh , ei)| + liGj^^AhVh , ei}\ + ^ ^ A + B + 

^/n y n 

We have 

^ ^ WG^^AhBhW < \\gj;^\\\\AhBh\\ < \\g^^\\V^\\AhBh\\oo, 

and 

\{AhBh)j\ = UjWh^lif - Plh, ^ \\f-Pf,h\\h ^ujf,n{xo,h) + ^. 

For any symmetrical and positive matrix M we have X~^{M) = ||M“^|| then since ||A^^|| ^ 1 
we have on the event Glh- 

II5»‘II = IIai'x;‘a;'|| < ||x;>|| = A-'(Xft) < 

Thus A < llGh^Wf^ + l‘^/,K(a;o, h) + eyjn + l ^ \\gJ^^\\^JlP^P^ulf^k{xo,h) + ey/n + l since 
k ^ K. Conditional on the random vector Vh is centered Gaussian with covariance 
matrix a‘^N~lX^h- Thus g^^AhVh is again centered Gaussian, with covariance matrix 

^^Kjfih^^hy^hAhg-h^ = a^N-j^g-\ 
and B is then centered Gaussian with variance 

Since Gh is positive symmetrical and its entries are smaller than one in absolute value we 
get ||^■^|| = and X^Gh) = inf||,j,||=i(x, Ghx) ^ \\Ghei\\ ^ + 1. Thns \\GJ^^\\ ^ 

\/k+ IP, and the proposition follows. □ 

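Let us introduce the events

$A_{h',h,j} = \{|\langle \bar f_{h,\kappa} - \bar f_{h',\kappa}, \phi_j \rangle_{h'}| \le \sigma \|\phi_j\|_{h'} T_{n,h',h}\}$,

$A_{h',h} = \bigcap_{j=0}^{\kappa} A_{h',h,j}$ and $A_h = \bigcap_{h' \in \mathcal{H}_h} A_{h',h}$. The following lemma shows that if some bandwidth $h$ is good in the sense that $h \le H_{n,\omega}$ ($h$ is smaller than the ideal adaptive bandwidth) then $h$ can be selected by the procedure with a large probability.

Lemma 2. Let $f \in \mathcal{F}_\delta(x_0, \omega)$ for $\omega \in$ RV($s$) with $0 < s \le \kappa + 1$. If $h$ is such that $h \le H_{n,\omega} \wedge \delta$ we have on $\Omega_h$ for any $n \ge \kappa + 1$:

$P^n_{f,\mu}\{A_h^c \mid \mathfrak{X}_n\} \le (\kappa + 1) N_{n,h}^{-2p}$.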
Proof. Let j G {0, ..., k} and h' G Tih- On we have in view of (Ih.lH that 
thus nsing (lOl we can decompose: 

{fh',K ~ fh,K ) 4‘j)h' — {Y — fh,K ) 4‘j)h' — if ~ fh,K j 4'j)h' “h (C ) 4'j)h' 

= if - Pk,/i(/) , (Pj)h' + (Pk,/i(/) - fh,K , 4>j)h' + if. , 4>j)h' 
= {f- P.,h{f) . ^j)h' + {F.Mf - y) > 

= if - Pk,/i(/) , 4>j)h’ - {PkMO > 4)j)h' + if ., 4>j)h' 

= A + B + C. 


The term ^ is a bias term. By the definition of ujf^kixo, h) we can find a polynomial Pjj^ G 14 
such that 


sup \f{x) - Pf^hix)\ ^ UJf,k{xo, h) +£n, 

x£[xo — h,xo+h] 


where £n — (12.41) 1. Since h' ^ h ^ 6, f e Ts{xq,uj) and PJj^ G 14 C 14 

we get 


1^1 ^ 11/ - p.,h{f)\\h'\m\h' ^ 11/ - pf,h - p.,h(/ - Pf,h)\\hm\h' 

^\\f-Pf,h\\hm\h' 




||/i'(^/,A:(^0) h) T ^r. 




\h/{uj{h) + £r. 


since P^^/i is a projection with respect to (•, ■)h. If h < Hn^to we have in view of (toil 
that ui{h) 4 When h = two cases can occur. If the graphs of h 

N~j^ logn and h i—> w(h) cross each other we have ijj{h) = fTy^logn. When these 
graphs do not cross we introduce = max{/i G |/i < Lfn.w} and = min{/i G 7i\h ^ 
iLn.oj}- If H we have 4 ^n,iP+„ ^ ^ we 

get 4 4 (1 + Then for any h 4 i?n,a;: 


1^1 < 


- Q) ^logn + £n) 
ll</i lk'(o'Y^(l + a)iV“^ logn + £n) 


ifw = wf°“. 


(7.2) 


Conditional on B and C are centered Gaussian. We have C{C\Xn) = A7(0, 

and conditional on X„ the vector PK,fe(^) is centered Gaussian with covariance matrix 

fX^PK,hPK,h = O-^Pk,?!! 


since P^^h is a projection. Thus B is centered Gaussian with variance 

E’^,^{(p.,h(0, /i)^'|Tn} ^ !!</>,-11^, 

= ^n':Ml/>ii'tr(Var(P,,,(0|XO) 
where we last used that $P_{\kappa,h}$ is the projection onto $V_\kappa$. Then conditional on $\mathfrak{X}_n$, $B + C$ is centered Gaussian with variance


+ Cf\Xn} ^ + ‘iBC + C^\Xn} 

4 Kl^{B^\Xn} + 2y/E-^{B2|X„}E-^{C2|X„} + El^{C^\Xn} 


^ 0-^(1 + \/k + 

Using m and since 2 ^ Nn,h ^ n on Qh we have 

\B + C\ 


||2 ^2 


A 


h',h,j 


,• C 




- 1 / 2 , 


Il'Pj II/i'C'k 


> 




and using a standard Gaussian large deviation inequality we get 
nMh',h,j\^n] ^ exp(-(l + 2p) logNn,h) = 
Since #(Ti.h) ^ we finally have 


nMh\^n} ^ U U ^ (k + l)iV; 


-2p 

h ‘ 


h'enHj=o 


□ 


Lemma 3. Let h G TL and h' G Tih- On the event Llfi' H Ah',h one has: 
\fhixo) - fh'{xo)\ ^ 


where T Op^ . 

Proof. In view of definition ^ and since Qh' is invertible on we have 

IMxo) - fh'(xo)l = l(AfM0h - M , ei)| ^ WAf^Oh - M\\ 

= \\g-^Ah'Xh>{dh-M\\ 

= \\Gh^Ah'Dh',h\\ ^ WG^^IIVk + l||A/i/Z)/i/^/i||oo 


On Ah',h we have for any j G {0,..., k}: 


^fi')j\ — \{fh fh' ) Gj)h'\ ^ 

thus \\AhiDy^h\\oo ^ crTn^h',h- Since h' ^ h and Nn^h ^ n we have when LL 

T^n,h',h ^ {C k\/ Op + T u )'\/ Lln,h' iog Tl, 

and when H = we have by construction Nn^h ^ 1 + a thus (A^n,h — o 

and holds again. 


^geom 


(7.3) 


1-1 


^ (1 + a)N^ 


-1 


Lemma 4. For any p,a > 0 and 0 < h' ^ h ^ 1 the estimator fh' given by definition^ 
satisfies: 

sup E’;,^{|A,(^o)r|:^n} ^ a,p,K(« V 
f GU{a) 

where C(j,p,K = (k + /«+(! + otY e'^p{—tf I2)dt. 


□ V 
Proof. If $N_{n,h'} = 0$ we have $\bar f_{h'} = 0$ and the result is obvious, thus we assume $N_{n,h'} > 0$.
~ ^ y 2 ^ 

Since X(X.h') ^ > 0, X/j/ and A^i are invertible and also Qh'- Thus, 

fh'{xo) = , ei) = , ei) = AyYhi , ei). 


For any j € {0,..., k} we have {Ah'Yh')j = H'/'j ||;,/((/ > + (C , (t>j)h') = Bh'j + I4zj. 

Since f GU (a) we have 

\Bh',j\ ^ \\4>j\\h^\{f, 4>j)h'\ ^ ll/IU' < a, 

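thus $\|B_{h'}\|_\infty \le \alpha$. Since $V_{h'}$ is, conditional on $\mathfrak{X}_n$, a centered Gaussian vector with variance $\sigma^2 N_{n,h'}^{-1}\Lambda_{h'}\mathbf{X}_{h'}\Lambda_{h'}$ we have that $\mathcal{G}_{h'}^{-1}V_{h'}$ is also centered Gaussian, with variance
$$\sigma^2 N_{n,h'}^{-1}\,\mathcal{G}_{h'}^{-1}\Lambda_{h'}\mathbf{X}_{h'}\Lambda_{h'}\mathcal{G}_{h'}^{-1} = \sigma^2 N_{n,h'}^{-1}\,\Lambda_{h'}^{-1}\bar{\mathbf{X}}_{h'}^{-1}\mathbf{X}_{h'}\bar{\mathbf{X}}_{h'}^{-1}\Lambda_{h'}^{-1}.$$

The variable $\langle \mathcal{G}_{h'}^{-1}V_{h'}, e_1\rangle$ is then, conditional on $\mathfrak{X}_n$, centered Gaussian with variance
$$v_{h'}^2 = \sigma^2 N_{n,h'}^{-1}\langle e_1, \Lambda_{h'}^{-1}\bar{\mathbf{X}}_{h'}^{-1}\mathbf{X}_{h'}\bar{\mathbf{X}}_{h'}^{-1}\Lambda_{h'}^{-1}e_1\rangle \le \sigma^2 N_{n,h'}^{-1}\,\|\Lambda_{h'}^{-1}\|^2\,\|\bar{\mathbf{X}}_{h'}^{-1}\|^2\,\|\mathbf{X}_{h'}\|,$$
and since clearly $\|\mathbf{X}_{h'}\| \le K+1$, $\|\Lambda_{h'}^{-1}\| \le 1$ and $\|\bar{\mathbf{X}}_{h'}^{-1}\| = \lambda^{-1}(\bar{\mathbf{X}}_{h'}) \le N_{n,h'}^{1/2}$ we have $v_{h'}^2 \le \sigma^2(K+1)$ and $\|\mathcal{G}_{h'}^{-1}\| \le \|\Lambda_{h'}^{-1}\|\,\|\bar{\mathbf{X}}_{h'}^{-1}\|\,\|\Lambda_{h'}^{-1}\| \le N_{n,h'}^{1/2}$. Finally we have
$$|\hat f_{h'}(x_0)| \le |\langle \mathcal{G}_{h'}^{-1}B_{h'}, e_1\rangle| + |\langle \mathcal{G}_{h'}^{-1}V_{h'}, e_1\rangle| \le \|\mathcal{G}_{h'}^{-1}\|\,(\|B_{h'}\| + \sigma\sqrt{K+1}\,|\gamma_{h'}|)$$
$$\le \sqrt{K+1}\,N_{n,h'}^{1/2}\,(\|B_{h'}\|_\infty + \sigma|\gamma_{h'}|)$$
$$\le \sqrt{K+1}\,(\alpha \vee 1)\,N_{n,h'}^{1/2}\,(1 + \sigma|\gamma_{h'}|),$$
where $\gamma_{h'}$ is, conditional on $\mathfrak{X}_n$, centered Gaussian with variance $\le 1$. The lemma follows by integrating with respect to $\mathbb{P}^n_{f,\mu}(\cdot \mid \mathfrak{X}_n)$. □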
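The last integration step in the proof of Lemma 4 reduces to a Gaussian moment of the form $\mathbb{E}(1+\sigma|Z|)^p$ with $Z$ standard normal. The following is a minimal numerical sketch of that moment, assuming the constant has the form $(K+1)^{p/2}\sqrt{2/\pi}\int_0^\infty (1+\sigma t)^p e^{-t^2/2}\,dt$; the function names, the truncation point and the step count are illustrative choices, not part of the paper.

```python
import math


def gaussian_moment(sigma, p, upper=12.0, steps=200_000):
    """Midpoint-rule approximation of
    sqrt(2/pi) * int_0^upper (1 + sigma*t)^p * exp(-t^2/2) dt,
    i.e. E[(1 + sigma*|Z|)^p] for a standard normal Z.
    The tail beyond `upper` is ignored (it is negligible for upper = 12)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += (1.0 + sigma * t) ** p * math.exp(-0.5 * t * t)
    return math.sqrt(2.0 / math.pi) * total * width


def lemma4_constant(sigma, p, K):
    """Hypothetical helper: (K+1)^(p/2) * E[(1 + sigma*|Z|)^p].
    The exact form of the constant is an assumption of this sketch."""
    return (K + 1) ** (p / 2.0) * gaussian_moment(sigma, p)
```

For $p = 1$ and $p = 2$ the moment has the closed forms $1 + \sigma\sqrt{2/\pi}$ and $1 + 2\sigma\sqrt{2/\pi} + \sigma^2$, which gives a simple sanity check of the quadrature.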


Proof of theorem^ We first work on the event {Hn < Hn^j}- de fini tion of we have 
{Hn < C A^jj* . Uniformly for / G U{a) we have using the lemmas O and @1 

E]jR-pfn(xo) - f(xo)n^^^^^ jAn} 


4 (2P V l)i24(^y'E-^{|/5jxo)pp|T„} + |/(xo)r j JX„} 

4 (2^ V l)c7-^(a V 1)^(VC^ + l)v/;^(log = o^l). 

Now we work on the event {ffn,uj ^ Bn}- By definition of Hn we have {H*^^ 4 Hn} C 


A 




and using lemma [31 we get on Qh* ■ 


\\Bn,ui- 


(7.4) 


IfnS^o) - fH*,Axo)\ 4 Cp,n,a\\^i ^ 

Since s 4 «^ + 1 we have /c = [sj 4 k and ujf^n,ixo, h) 4 ujj^kixo, h). In view of lemma Hand 
since / G Rh* ^{xo,uj) one has on Qh*^' 

\fH*Jxo) - f{xo)\ 4 WGhI^ IIVk + l(a;(i7;,^) + 
where 7 / 7 *^ is, conditional on An, centered Gaussian with \An} 4 1- When 

Bnu: w 0 1 i3jV0 to(^1 ^^ lo^ Ti. WI 1011 u w 0 pro 000 cl &iS m th 0 

proof of l 0 mmasI 21 and[S|to provo that 

^{Bn,J ^ ^yJi^ + a)N-^logn, 
(7.5)


in both cases H = or H = Wf Then 

IfH^Jxo) - /(xo)| ^ Rn,Lo\\GHl^^ || + 1 (Vl + a + hH*j)- 

Finally, the inequalities and Hi together entail: 

Kllfnixo) - fixo)\iH*^<iHr.,^ ^ + 1(\/1 + a + 

and the result follows by integration with respect to $\mathbb{P}^n_{f,\mu}(\cdot \mid \mathfrak{X}_n)$. □

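7.2. Preparatory results and proof of theorem 2. Let us denote by $\mathbb{P}^n_\mu$ the joint probability of the variables $(X_i)$. We define $F_\nu(h) = \int_0^h \nu(t)\,dt$.

Lemma 5. If $\mu \in \mathcal{R}(x_0,\beta)$ one has for any $\varepsilon, h > 0$:
$$\mathbb{P}^n_\mu\Big\{\Big|\frac{N_{n,h}}{2nF_\nu(h)} - 1\Big| > \varepsilon\Big\} \le 2\exp\Big(-\frac{\varepsilon^2}{1+\varepsilon/3}\,nF_\nu(h)\Big).$$

Proof. It suffices to apply the Bernstein inequality to the sum of the independent random variables $Z_i = \mathbf{1}_{|X_i - x_0| \le h} - \mathbb{P}^n_\mu\{|X_1 - x_0| \le h\}$ for $i = 1,\dots,n$. □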
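The arithmetic behind Lemma 5 can be checked directly. The sketch below assumes the usual two-sided Bernstein bound $2\exp(-t^2/(2(v + bt/3)))$ for centered independent variables bounded by $b$ with total variance at most $v$, and assumes that $\mathbb{P}\{|X_1 - x_0| \le h\} = 2F_\nu(h)$, so that $t = 2nF_\nu(h)\varepsilon$, $v = 2nF_\nu(h)$ and $b = 1$. The function names are illustrative.

```python
import math


def bernstein_tail(t, v, b):
    """Two-sided Bernstein bound 2*exp(-t^2 / (2*(v + b*t/3)))."""
    return 2.0 * math.exp(-t * t / (2.0 * (v + b * t / 3.0)))


def lemma5_bound(n, F, eps):
    """Right-hand side of Lemma 5: 2*exp(-eps^2 / (1 + eps/3) * n * F),
    where F stands for F_nu(h)."""
    return 2.0 * math.exp(-eps * eps / (1.0 + eps / 3.0) * n * F)


def lemma5_via_bernstein(n, F, eps):
    """Same bound obtained from the generic Bernstein inequality with
    deviation t = 2nF*eps, variance proxy v = 2nF and |Z_i| <= 1."""
    return bernstein_tail(2.0 * n * F * eps, 2.0 * n * F, 1.0)
```

Both functions return the same value for any admissible input, which confirms that the exponent $\varepsilon^2 nF_\nu(h)/(1+\varepsilon/3)$ is exactly what Bernstein's inequality gives under these assumptions.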

Lemma 6. If ^ TZ{xo,f3) for (3 > —1, u G RV(s) for s > 0 and {hn,uj) is defined by 113.till 

then rn^Lo = ^{hn,Lu) satisfies 

rn,uj ~ as n ^ + 00 , (7.6) 

where is .slowly varying and When oj{h) = rh^ [Holder regularity) 

for r > 0 we have more precisely: 

Xn,u} ~ as n ^ + 00 , (7.7) 

where £s,u is again slowly varying. 

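Proof. Let us define $G(h) = \omega^2(h)F_\nu(h)$. Since $\beta > -1$ we have $F_\nu \in \mathrm{RV}(\beta+1)$ (see appendix) and $G \in \mathrm{RV}(1+2s+\beta)$. The function $G$ is continuous and such that $\lim_{h\to0^+}G(h) = 0$ in view of (A.2), since $1+2s+\beta > 0$. Then for $n$ large enough $h_n$ is given by $h_n = G^{\leftarrow}(\sigma^2\log n/(2n))$ where $G^{\leftarrow}(h) = \inf\{y \ge 0 \mid G(y) \ge h\}$ is the generalised inverse of $G$. Since $G^{\leftarrow} \in \mathrm{RV}(1/(1+2s+\beta))$ (see appendix) we have $\omega \circ G^{\leftarrow} \in \mathrm{RV}(s/(1+2s+\beta))$ and we can write $\omega \circ G^{\leftarrow}(h) = h^{s/(1+2s+\beta)}\ell_{\omega,\nu}(h)$ where $\ell_{\omega,\nu}$ is slowly varying. Thus
$$r_n = \omega \circ G^{\leftarrow}\Big(\frac{\sigma^2\log n}{2n}\Big) \sim \Big(\frac{\sigma^2}{2}\Big)^{s/(1+2s+\beta)}\Big(\frac{\log n}{n}\Big)^{s/(1+2s+\beta)}\ell_{\omega,\nu}\Big(\frac{\log n}{n}\Big) \quad\text{as } n \to +\infty,$$
since $\ell_{\omega,\nu}$ is slowly varying. When $\omega(h) = rh^s$ we can write more precisely $h_n = G^{\leftarrow}(\sigma^2\log n/(2r^2n))$ where $G(h) = h^{2s}F_\nu(h)$, so (7.6) and (7.7) follow. □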
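To make the rate in Lemma 6 concrete, here is a small sketch for the pure power case, assuming $\omega(h) = rh^s$ and $\nu(h) = h^\beta$ (so that the slowly varying terms are constant and $F_\nu(h) = h^{\beta+1}/(\beta+1)$). It solves $\omega^2(h)F_\nu(h) = \sigma^2\log n/(2n)$ by bisection and compares with the closed form. All function names are illustrative and the example is not taken from the paper.

```python
import math


def F_nu(h, beta):
    """F_nu(h) = int_0^h t^beta dt for the pure power design nu(t) = t^beta."""
    return h ** (beta + 1.0) / (beta + 1.0)


def G(h, r, s, beta):
    """G(h) = omega(h)^2 * F_nu(h) with omega(h) = r * h^s."""
    return (r * h ** s) ** 2 * F_nu(h, beta)


def solve_bandwidth(n, r, s, beta, sigma, iters=200):
    """Bisection on (0, 1] for the smallest h with G(h) >= sigma^2 log(n) / (2n)."""
    target = sigma ** 2 * math.log(n) / (2.0 * n)
    if G(1.0, r, s, beta) < target:
        raise ValueError("no solution in (0, 1]; increase n")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid, r, s, beta) >= target:
            hi = mid
        else:
            lo = mid
    return hi


def closed_form_bandwidth(n, r, s, beta, sigma):
    """Explicit solution of r^2 h^(2s) h^(beta+1) / (beta+1) = sigma^2 log(n) / (2n)."""
    target = sigma ** 2 * math.log(n) / (2.0 * n)
    return ((beta + 1.0) * target / r ** 2) ** (1.0 / (1.0 + 2.0 * s + beta))


def rate(n, r, s, beta, sigma):
    """r_n = omega(h_n) = r * h_n^s."""
    return r * closed_form_bandwidth(n, r, s, beta, sigma) ** s
```

In this special case the ratio of `rate(n, ...)` to $(\log n/n)^{s/(1+2s+\beta)}$ does not depend on $n$, which is the content of (7.7) when the slowly varying factor is constant.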

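Let us introduce the following notations: if $\alpha \in \mathbb{N}$ and $h > 0$ we define
$$N_{n,h,\alpha} = \sum_{i=1}^n \Big(\frac{X_i - x_0}{h}\Big)^{\alpha}\mathbf{1}_{|X_i - x_0| \le h}.$$
Note that $N_{n,h,0} = N_{n,h}$. For $\varepsilon > 0$, we define the event:
$$D_{n,h,\alpha,\varepsilon} = \Big\{\Big|\frac{N_{n,h,\alpha}}{nF_\nu(h)} - C_{\alpha,\beta}\Big| \le \varepsilon\Big\},$$
where $C_{\alpha,\beta}$ is the constant defined earlier.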
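Lemma 7. For any $\alpha \in \mathbb{N}$, $\varepsilon > 0$ and if $\mu \in \mathcal{R}(x_0,\beta)$ we have for any positive sequence $(\gamma_n)$ going to 0 and when $n$ is large enough:
$$\mathbb{P}^n_\mu\{D^c_{n,\gamma_n,\alpha,\varepsilon}\} \le 2\exp\Big(-\frac{\varepsilon^2}{8(2+\varepsilon/3)}\,nF_\nu(\gamma_n)\Big).$$

Proof. Let us define $Q_{i,n,\alpha} = \big((X_i - x_0)/\gamma_n\big)^{\alpha}\mathbf{1}_{|X_i - x_0| \le \gamma_n}$ and $Z_{i,n,\alpha} = Q_{i,n,\alpha} - \mathbb{E}^n_\mu\{Q_{i,n,\alpha}\}$. Since $\mu \in \mathcal{R}(x_0,\beta)$ one has for $n$ such that $[x_0 - \gamma_n, x_0 + \gamma_n] \subset W$ and $i \in \{1,\dots,n\}$:
$$\mathbb{E}^n_\mu\{Q_{i,n,\alpha}\} = (1 + (-1)^{\alpha})\,\gamma_n^{-\alpha}\int_0^{\gamma_n} t^{\alpha+\beta}\ell_\nu(t)\,dt,$$
where $\ell_\nu(h) = h^{-\beta}\nu(h)$ is slowly varying (see appendix) and in view of (A.3) we have
$$\lim_{n\to+\infty}\frac{1}{F_\nu(\gamma_n)}\,\mathbb{E}^n_\mu\{Q_{i,n,\alpha}\} = C_{\alpha,\beta}.$$
Then for $n$ large enough one has:
$$D^c_{n,\gamma_n,\alpha,\varepsilon} \subset \Big\{\Big|\frac{1}{nF_\nu(\gamma_n)}\sum_{i=1}^n Z_{i,n,\alpha}\Big| > \frac{\varepsilon}{2}\Big\}. \qquad (7.9)$$
We have $\mathbb{E}^n_\mu\{Z_{i,n,\alpha}\} = 0$ and $|Z_{i,n,\alpha}| \le 2$. Since
$$b_n^2 := \sum_{i=1}^n \mathbb{E}^n_\mu\{Z_{i,n,\alpha}^2\} \le 2nF_\nu(\gamma_n),$$
and the $Z_{i,n,\alpha}$ are independent we can apply the Bernstein inequality. If $\tau_n = \frac{\varepsilon}{2}\,nF_\nu(\gamma_n)$, (7.9) and the Bernstein inequality entail:
$$\mathbb{P}^n_\mu\{D^c_{n,\gamma_n,\alpha,\varepsilon}\} \le 2\exp\Big(\frac{-\tau_n^2}{2(b_n^2 + 2\tau_n/3)}\Big) \le 2\exp\Big(-\frac{\varepsilon^2}{8(2+\varepsilon/3)}\,nF_\nu(\gamma_n)\Big). \qquad \Box$$

Let us introduce for $\varepsilon > 0$ the event
$$C_{n,\varepsilon} = \{(1-\varepsilon)h_{n,\omega} < H_{n,\omega} \le (1+\varepsilon)h_{n,\omega}\},$$
where $H_{n,\omega}$ is the bandwidth defined earlier.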
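The limit used in the proof of Lemma 7 can be illustrated in the pure power case. The sketch below assumes a design density equal to $|x - x_0|^\beta$ near $x_0$ (so $\ell_\nu \equiv 1$) and assumes the closed form $C_{\alpha,\beta} = (1+(-1)^\alpha)(\beta+1)/(1+\alpha+\beta)$, which is what Karamata's theorem (A.3) gives for the limit of $\mathbb{E}\{Q_{i,n,\alpha}\}/F_\nu(\gamma_n)$ with $Q_{i,n,\alpha} = ((X_i - x_0)/\gamma_n)^\alpha \mathbf{1}_{|X_i - x_0| \le \gamma_n}$. Names and step counts are illustrative.

```python
def c_alpha_beta(alpha, beta):
    """Assumed closed form of the limit: (1 + (-1)^alpha)(beta + 1)/(1 + alpha + beta)."""
    return (1 + (-1) ** alpha) * (beta + 1.0) / (1.0 + alpha + beta)


def midpoint(func, a, b, steps=20_000):
    """Plain midpoint rule for int_a^b func(t) dt."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))


def moment_ratio(alpha, beta, gamma):
    """E[Q] / F_nu(gamma) for nu(t) = t^beta on [0, gamma].
    The density is left unnormalised: the normalising constant would
    multiply numerator and denominator alike and cancels in the ratio."""
    one_side = midpoint(lambda t: (t / gamma) ** alpha * t ** beta, 0.0, gamma)
    mean_q = (1 + (-1) ** alpha) * one_side  # odd powers cancel by symmetry
    f_nu = midpoint(lambda t: t ** beta, 0.0, gamma)
    return mean_q / f_nu
```

With a pure power density the ratio equals the assumed constant for every $\gamma$, not only in the limit; for $\alpha = 0$ it equals 2, in agreement with the normalisation $N_{n,h}/(2nF_\nu(h))$ of Lemma 5.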

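Lemma 8. If $\omega \in \mathrm{RV}(s)$ for $s > 0$ then for any $0 < \varepsilon_2 \le 1/2$ there exists $0 < \varepsilon_3 \le \varepsilon_2$ such that for $n$ large enough
$$D_{n,(1-\varepsilon_2)h_{n,\omega},0,\varepsilon_3} \cap D_{n,(1+\varepsilon_2)h_{n,\omega},0,\varepsilon_3} \subset C_{n,\varepsilon_2}.$$

Proof. By the definition of $H_{n,\omega}$ we have
$$\{H_{n,\omega} \le (1+\varepsilon_2)h_{n,\omega}\} = \{N_{n,(1+\varepsilon_2)h_{n,\omega}} \ge \sigma^2\omega^{-2}((1+\varepsilon_2)h_{n,\omega})\log n\}.$$
It is clear that $\varepsilon_3 = \big(1 - (1-\varepsilon_2^2)^{-2}(1+\varepsilon_2)^{-2s}\big) \wedge \varepsilon_2 > 0$ for $\varepsilon_2$ small enough. We recall that $\ell_\omega$ stands for the slow term of $\omega$. Since (A.1) holds uniformly over each compact set in $(0,+\infty)$ we have when $n$ is large enough that for any $y \in [1/2, 3/2]$:
$$(1-\varepsilon_2^2)\,\ell_\omega(h_{n,\omega}) \le \ell_\omega(yh_{n,\omega}) \le (1+\varepsilon_2^2)\,\ell_\omega(h_{n,\omega}), \qquad (7.10)$$
so (7.10) with $y = 1+\varepsilon_2$ ($\varepsilon_2 \le 1/2$) entails, in view of the definition of $h_{n,\omega}$ and since $F_\nu$ is increasing:
$$2(1-\varepsilon_3)\,nF_\nu((1+\varepsilon_2)h_{n,\omega}) \ge (1-\varepsilon_2^2)^{-2}(1+\varepsilon_2)^{-2s}\sigma^2\omega^{-2}(h_{n,\omega})\log n$$
$$= \sigma^2((1+\varepsilon_2)h_{n,\omega})^{-2s}(1-\varepsilon_2^2)^{-2}\ell_\omega^{-2}(h_{n,\omega})\log n$$
$$\ge \sigma^2\omega^{-2}((1+\varepsilon_2)h_{n,\omega})\log n.$$
Thus
$$\{N_{n,(1+\varepsilon_2)h_{n,\omega}} \ge 2(1-\varepsilon_3)\,nF_\nu((1+\varepsilon_2)h_{n,\omega})\} \subset \{H_{n,\omega} \le (1+\varepsilon_2)h_{n,\omega}\},$$
and similarly on the other side we have for $n$ large enough
$$\{N_{n,(1-\varepsilon_2)h_{n,\omega}} \le 2(1+\varepsilon_3)\,nF_\nu((1-\varepsilon_2)h_{n,\omega})\} \subset \{(1-\varepsilon_2)h_{n,\omega} < H_{n,\omega}\},$$
thus the lemma. □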

Let us denote Gn — Gh„, uj introduce the events A„_£ = {|A(^,i) — Xk,i 3 \ ^ £} for e > 0 
and for a G N 

1 ^ /Aj-xoN« 


B, 




I nF^iK, 

\^i ^0 I 

Lemma 9. If uj G RV(s) for s > 0 and // G F{xo,(5) for (5 > —1 we can find for any 
0 < e ^ ^ an event An,e £ such that for n large enough 

“^77,£ C n Bjj^o,£ Li Cn^ei 

and 

^ 4(k + 2 ) exp ( - . ( 7 . 12 ) 

Proof. Using the fact that $\lambda(M) = \inf_{\|x\|=1}\langle x, Mx\rangle$ for any symmetric matrix $M$ and since $\mathcal{G}_n$ and $\mathcal{G}$ are symmetric we get

2k, 


n - iG)j,i\ 




a=0 


(1 + k)' 


r I" U An^e- 


Since | iG)j,i | ^ 1 we can find easily 0 < ei ^ e snch that for any 0 ^ j,l ^ k 


B. 


i,j+Z,£i L Bjj 2j,£i L Bn,j2i,£i U " 1 ^ I (^n)j,Z 








and then 


2k 


q;,£i ^ ^n,£- 


a =0 

We define £2 — 5 f^£i £3 such that — = l+e 2 . Since h 1 —> Nn,h is increasing 

we have 

Cn,£3 C {A^r7,(l —£3)h„ ^ IIn,Hn ^ -^71,(l+£3)/ln} 7 

and using lemma |H 1 we can find £4 ^ £3 such that 

B 7 i,(l-£ 3 )h„, 0,£4 ^^n,{l+e 3 )hnfi,£i F Cn, 63 - 

In view of iSU and since £i,(h) = Fn{h)h is slowly varying we have for n large enough 

and any 0 < £3 ^ 1/2 

4((1 + £z)hn) ^ (1 + £z)Iu{hn) and 4((1 - £z)hn) ^ (1 - £3)4(/in), (7.13) 

thus 


B7i,(l-e3)/i„,o,£4 L Q n 0,£3 F Fn ,62 i A/" ^ ^ £2'|’ 


and on IXn^(^i—e 3 )h„,o ,64 L Hn^(^i-{-£ 3 'jh„,o,e 4 L Hn^h„, 0,63 have 


nFy{hr. 


E 




Xj - Xq 
hr). 


-N) 


n,h„,a 


€ 


Hn V hn\'^ A^n,h„ 


hn ^ "IT-Firi^hn^ 


N) 


n,H„ 


N, 


- 1 


n,h„ 


^ (1 + £3)°^(2 + £3)£2 ^ £ 1 / 2 , 
since £3 ^ 1/2. Then we have since £4 ^ £3 ^ £2 ^ ^ 

I^n,(l-e3)Ii„,0,£4 ^ I^n,(l+e3)/i„,0,£4 ^ ^n,h„,0,64, ^ T^n,hn,a,e4 C ^n,a,ei^ 

and finally 


2k; 

‘^n,£ = n n Dn,/in, 0,£4 ^ C A.n^e ^n.e-) 

0=0 

thus (17.1 1 1I . Using lemma 0 we obtain easily in view of (IZ3SI and (EH) for n large enongh 

^ 4(K + 2)exp (4/^up^y^2-(^+2)cjV-2iogn), 
thus (TTra and the lemma follows. □ 


Proof of theorem\^ Since H = we have i/n,w = H*^u} ^n,uj = We can 

assume without generality loss that £ = £» — We consider the event An,e 

from lemma ini Clearly, we have for n large enough An,e U and J^gh„^^{xo,uj) C 

In view of (17.1 IK and theorem ^ we have nniformly for / G S: 


^],t,{'rn^\fnixo) - fixoW^AnJ ^ (1 

^ (1 


- e)-P/^E]jR-P\l{xo) - f{xo)\nn,J 
-e)-P/^ci{X^,p-e)-P{l + Onil)). 


Now we work on the complementary Af^^. Using lemma^and equation (j7.1211 we get since 
f eU{a) and ^ n: 

ElJr-nUxo) - /{x„)|f l45„) < (2!> V i)rZ’’(sJwjf{\UAm + cf) frfipj 

< (25 V l)(a V 

thus we have proved EH and EH follows from lemma El □ 


7.3. Computation of the example. 

Lemma 10. Let $a \in \mathbb{R}$ and $b > 0$. If $G(h) = h^b(\log(1/h))^a$, then we have
$$G^{\leftarrow}(h) \sim b^{a/b}\,h^{1/b}\,(\log(1/h))^{-a/b} \quad\text{as } h \to 0^+.$$

The proof of this lemma can be found in Gaïffas (2004). Using this lemma, we obtain that an equivalent of $h_n$ is

(1 + 2s + /3)(“+27)/(1+2£+/3) j'^^2/(1+2£+/3) ^ 

and since u){h) = rh^{\og{l/h))'^ we find that an equivalent of (up to a constant depending 
on s, /3,7, a) is (tTToli . 
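The equivalent in Lemma 10 can be explored numerically. The sketch below works in the log domain, with $u = \log h$, to avoid floating-point underflow for very small $h$, and computes the generalised inverse of $G(h) = h^b(\log(1/h))^a$ by bisection on the branch where $G$ is increasing. Function names and the bracketing rule are illustrative assumptions. Convergence of the ratio to 1 is slow (logarithmic), which the example makes visible.

```python
import math


def log_G(u, a, b):
    """log G(e^u) for G(h) = h^b * log(1/h)^a, where u = log h < 0."""
    return b * u + a * math.log(-u)


def log_G_inverse(log_h, a, b, iters=200):
    """Bisection for the u < 0 solving log_G(u) = log_h.
    For u <= -2|a|/b the derivative b + a/u is at least b/2 > 0,
    so log_G is increasing on the whole bracket."""
    hi = -max(1.0, 2.0 * abs(a) / b)
    if log_G(hi, a, b) < log_h:
        raise ValueError("h is not small enough for this bracket")
    lo = 2.0 * hi
    while log_G(lo, a, b) > log_h:
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_G(mid, a, b) >= log_h:
            hi = mid
        else:
            lo = mid
    return hi


def log_asymptotic(log_h, a, b):
    """log of the claimed equivalent b^(a/b) * h^(1/b) * log(1/h)^(-a/b)."""
    return (a / b) * math.log(b) + log_h / b - (a / b) * math.log(-log_h)


def ratio(exponent10, a, b):
    """Claimed equivalent divided by the exact inverse, at h = 10^exponent10."""
    log_h = exponent10 * math.log(10.0)
    return math.exp(log_asymptotic(log_h, a, b) - log_G_inverse(log_h, a, b))
```

For instance, with $a = 1$ and $b = 2$ the ratio is within a few percent of 1 at $h = 10^{-30}$ and closer still at $h = 10^{-100}$.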

7.4. Proof of the lower bound. The proof of theorem 3 is similar to the proof of theorem 3 in Brown and Low (1996). It is based on the next theorem, which can be found in Cai et al. (2004). This result is a general constrained risk inequality and is very useful for several statistical problems, for example superefficiency and adaptation.

Let $p > 1$ and $q$ be such that $\frac{1}{p} + \frac{1}{q} = 1$, and let $X$ be a real random variable having a distribution with density $f_\theta$ with respect to some measure $m$. The parameter $\theta$ can take two values, $\theta_1$ or $\theta_2$. We want to estimate $\theta$ based on $X$. For any estimator $\delta$ based on $X$ we define its risk by
$$R_p(\delta,\theta) = \mathbb{E}_\theta\{|\delta(X) - \theta|^p\}.$$
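We define $s(x) = f_{\theta_2}(x)/f_{\theta_1}(x)$ and $\Delta = |\theta_2 - \theta_1|$. Let
$$I_q = I_q(\theta_1,\theta_2) = \big(\mathbb{E}_{\theta_1}\{s^q(X)\}\big)^{1/q}.$$

Theorem 4 (Cai, Low and Zhao (2004)). If $\delta$ is such that $R_p(\delta,\theta_1) \le \varepsilon^p$ and if $\Delta > \varepsilon I_q$ we have
$$R_p(\delta,\theta_2) \ge (\Delta - \varepsilon I_q)^p \ge \Delta^p\Big(1 - \frac{p\,\varepsilon I_q}{\Delta}\Big).$$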
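As an illustration of how Theorem 4 is used, the sketch below evaluates both lower bounds for a two-point Gaussian problem, assuming $X \sim N(\theta, 1)$. In that case a direct computation gives $\mathbb{E}_{\theta_1}\{s^q(X)\} = \exp(q(q-1)\Delta^2/2)$, hence $I_q = \exp((q-1)\Delta^2/2)$. The Gaussian model and all function names are assumptions made for this example only.

```python
import math


def risk_lower_bounds(delta, eps, i_q, p):
    """Both lower bounds of the constrained risk inequality:
    (delta - eps*I_q)^p and delta^p * (1 - p*eps*I_q/delta)."""
    if delta <= eps * i_q:
        raise ValueError("the inequality requires delta > eps * I_q")
    sharp = (delta - eps * i_q) ** p
    linear = delta ** p * (1.0 - p * eps * i_q / delta)
    return sharp, linear


def gaussian_i_q(theta1, theta2, q):
    """Closed form of I_q when X ~ N(theta, 1)."""
    return math.exp(0.5 * (q - 1.0) * (theta2 - theta1) ** 2)


def gaussian_i_q_numeric(theta1, theta2, q, half_width=12.0, steps=200_000):
    """Midpoint-rule check of (E_theta1[s(X)^q])^(1/q) for unit-variance Gaussians.
    The integrand s(x)^q * f_theta1(x) is a Gaussian bump centred at
    theta1 + q*(theta2 - theta1), so the grid is centred there."""
    d = theta2 - theta1
    centre = theta1 + q * d
    a = centre - half_width
    width = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width
        log_s = d * x - 0.5 * (theta2 ** 2 - theta1 ** 2)
        log_f1 = -0.5 * (x - theta1) ** 2 - 0.5 * math.log(2.0 * math.pi)
        total += math.exp(q * log_s + log_f1)
    return (total * width) ** (1.0 / q)
```

The second bound is never larger than the first (Bernoulli's inequality for $p \ge 1$), so it is the weaker but more convenient form when $\varepsilon I_q/\Delta \to 0$.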

Proof of theorem\^ Since limsup^ — /o(3^o)|^} = C* < oo we have for 

n ^ N 

E/o,p{l/n(*o) - /o(xo)r} ^ 

Let <7 be fe times differentiable with support included in [—1,1], ^(0) > 0 and such that for 
any |x| ^ <5, — g^^\0)\ ^ k\\x\^~^. Such a function clearly exists. We de fin e 

fi{x) = fo{x) + {r-r)p^^g^^ 


Pn 


where pn is the smallest solution to 


rh^ = a\ 


' b log n 
2nFy{h) ’ 


where b = 2g^{p — 1)7 and g^o — sup^, | 5 '(a;)|. We clearly have /i G !Fs{xo,lij). Let Pq,P” 
be the joint laws of the observations (HU when respectively / = /o or / = /i. A sufficient 

A . dP?i 


statistic for {Pq,P”} is given by 4 i — log 


__o 

dPj* 

Vr, 


and 


AA(-Y,Un) under P;j, 

,AA(|,u.) 


under P^ 


where 


= ^ll/o - ^ ^ ~ fi{x)fp{x)dx ^ 2 {p - l) 7 logn. 


An easy computation gives 4 = exp(^E(|_J4^ ^ taking 8 n = fnixo), 02 = fiixo), 

01 = fo{xo) and e = fjn entails using theorem 0 ] 

i? 44 , 02 ) ^ [{r - r')plg{Q) - 2 Cfjnn-'^n'^Y ^ (r - r')V*V( 0 )(l - On(l)), 
since limnipn/Pn 0 , and the theorem follows. □ 


Appendix A. Some facts on regular variation 

We recall here briefly some results about regularly varying functions. The results stated in this section can be found in Seneta (1976), Geluk and de Haan (1987) and Bingham et al. (1989).

In all the following, let $\ell$ be a slowly varying function. An important result is that the property
$$\lim_{h\to0^+}\ell(yh)/\ell(h) = 1 \qquad (A.1)$$
actually holds uniformly for $y$ in any compact set of $(0,+\infty)$. If $R_1 \in \mathrm{RV}(\alpha_1)$ and $R_2 \in \mathrm{RV}(\alpha_2)$ we have

• $R_1 \times R_2 \in \mathrm{RV}(\alpha_1 + \alpha_2)$,

• $R_1 \circ R_2 \in \mathrm{RV}(\alpha_1 \times \alpha_2)$.
If $R \in \mathrm{RV}(\gamma)$ with $\gamma \in \mathbb{R} - \{0\}$ then as $h \to 0^+$ we have
$$R(h) \to \begin{cases} 0 & \text{if } \gamma > 0, \\ +\infty & \text{if } \gamma < 0. \end{cases} \qquad (A.2)$$

If $\gamma > -1$, one has:
$$\int_0^h t^{\gamma}\ell(t)\,dt \sim (1+\gamma)^{-1}h^{1+\gamma}\ell(h) \quad\text{as } h \to 0^+, \qquad (A.3)$$
and then $h \mapsto \int_0^h t^{\gamma}\ell(t)\,dt$ is regularly varying of index $1+\gamma$. This result is known as the Karamata theorem. If $R$ is continuous we define the generalised inverse as
$$R^{\leftarrow}(y) = \inf\{h \ge 0 \text{ such that } R(h) \ge y\}.$$

If $R \in \mathrm{RV}(\gamma)$ for some $\gamma > 0$ then there exists $R^{-} \in \mathrm{RV}(1/\gamma)$ such that
$$R(R^{-}(h)) \sim R^{-}(R(h)) \sim h \quad\text{as } h \to 0^+, \qquad (A.4)$$
and $R^{-}$ is unique up to an asymptotic equivalence. Moreover, one version of $R^{-}$ is $R^{\leftarrow}$.
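The Karamata equivalence (A.3) can be checked on an explicit slowly varying function. The sketch below takes $\ell(t) = \log(1/t)$, an illustrative choice not tied to the paper's examples. Integration by parts gives $\int_0^h t^\gamma\log(1/t)\,dt = h^{1+\gamma}\big(\log(1/h)/(1+\gamma) + 1/(1+\gamma)^2\big)$, so the ratio to the Karamata equivalent is $1 + 1/((1+\gamma)\log(1/h))$, which tends to 1 only at a logarithmic speed.

```python
import math


def integral_closed_form(h, gamma):
    """int_0^h t^gamma * log(1/t) dt for gamma > -1 (integration by parts)."""
    return h ** (1.0 + gamma) * (
        math.log(1.0 / h) / (1.0 + gamma) + 1.0 / (1.0 + gamma) ** 2
    )


def karamata_equivalent(h, gamma):
    """Right-hand side of (A.3) with l(t) = log(1/t)."""
    return h ** (1.0 + gamma) * math.log(1.0 / h) / (1.0 + gamma)


def karamata_ratio(h, gamma):
    """Exact integral divided by its Karamata equivalent."""
    return integral_closed_form(h, gamma) / karamata_equivalent(h, gamma)
```

The slow convergence seen here is the same phenomenon that makes the slowly varying factors in the rates of the paper non-negligible in practice.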